Benghazi
Language Model Tokenizers Introduce Unfairness Between Languages
Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences of up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
- North America > Haiti (0.14)
- Asia > Philippines > Luzon > Ilocos Region > Province of Pangasinan (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
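The length disparity described in the abstract above can be illustrated with a small sketch. It is not the paper's method: `byte_tokenize` is a hypothetical stand-in tokenizer that emits one token per UTF-8 byte, `length_ratio` is an illustrative helper name, and the sentence pair is an example chosen here rather than data from the paper.

```python
from typing import Callable, List


def byte_tokenize(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


def length_ratio(
    tokenize: Callable[[str], List[int]], reference: str, translation: str
) -> float:
    """Token count of `translation` divided by token count of `reference`."""
    return len(tokenize(translation)) / len(tokenize(reference))


# Illustrative sentence pair (an assumption, not taken from the paper).
english = "Good morning"
russian = "Доброе утро"

ratio = length_ratio(byte_tokenize, english, russian)
print(len(byte_tokenize(english)), len(byte_tokenize(russian)), ratio)
```

Any other tokenizer can be passed in place of `byte_tokenize`, as long as it maps a string to a sequence of tokens. A ratio above 1 means the translation costs more tokens than the reference for the same content.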
Data-driven Insights for Informed Decision-Making: Applying LSTM Networks for Robust Electricity Forecasting in Libya
Agaal, Asma; Essgaer, Mansour; Farkash, Hend M.; Othman, Zulaiha Ali
Accurate electricity forecasting is crucial for grid stability and energy planning, especially in Benghazi, Libya, where frequent load shedding, generation deficits, and infrastructure limitations persist. This study proposes a data-driven approach to forecast electricity load, generation, and deficits for 2025 using historical data from 2019 (a year marked by instability) and 2023 (a more stable year). Multiple time series models were applied, including ARIMA, seasonal ARIMA, dynamic regression ARIMA, exponential smoothing, extreme gradient boosting, and Long Short-Term Memory (LSTM) neural networks. The dataset was enhanced through missing value imputation, outlier smoothing, and log transformation. Performance was assessed using mean squared error, root mean squared error, mean absolute error, and mean absolute percentage error. LSTM outperformed all other models, showing strong capabilities in modeling non-stationary and seasonal patterns. A key contribution of this work is an optimized LSTM framework that integrates exogenous factors such as temperature and humidity, offering robust performance in forecasting multiple electricity indicators. These results provide practical insights for policymakers and grid operators to enable proactive load management and resource planning in data-scarce, volatile regions.
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.71)
- Africa > Middle East > Libya > Benghazi District > Benghazi (0.25)
- Asia > Malaysia (0.04)
- Research Report (1.00)
- Overview (1.00)
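The four error measures named in the forecasting abstract above have standard definitions, sketched here in plain Python. The function name `forecast_errors` and the sample numbers are illustrative assumptions and do not come from the study.

```python
import math
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return MSE, RMSE, MAE and MAPE (in percent) for paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    # MAPE divides by the actual value, so it is undefined when any actual is 0.
    mape = 100.0 * sum(abs(r / a) for r, a in zip(residuals, actual)) / n
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "mape": mape}


# Made-up load values, for illustration only.
scores = forecast_errors(actual=[100.0, 200.0], predicted=[110.0, 180.0])
print(scores)
```

Because MAPE divides by the observed value, a series that can reach zero (a deficit series might, for example) would need a different relative measure or explicit handling of those points.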
Computational Linguistics Meets Libyan Dialect: A Study on Dialect Identification
Essgaer, Mansour; Massud, Khamis; Mamlook, Rabia Al; Ghmaid, Najah
This study investigates logistic regression, linear support vector machine, multinomial Naive Bayes, and Bernoulli Naive Bayes for classifying Libyan dialect utterances gathered from Twitter. The dataset used is the QADI corpus, which consists of 540,000 sentences across 18 Arabic dialects. Preprocessing challenges include handling inconsistent orthographic variations and non-standard spellings typical of the Libyan dialect. The chi-square analysis revealed that certain features, such as email mentions and emotion indicators, were not significantly associated with dialect classification and were thus excluded from further analysis. Two main experiments were conducted: (1) evaluating the significance of meta-features extracted from the corpus using the chi-square test and (2) assessing classifier performance using different word and character n-gram representations. The classification experiments showed that Multinomial Naive Bayes (MNB) achieved the highest accuracy of 85.89% and an F1-score of 0.85741 when using a (1,2) word n-gram and (1,5) character n-gram representation. In contrast, Logistic Regression and Linear SVM exhibited slightly lower performance, with maximum accuracies of 84.41% and 84.73%, respectively. Additional evaluation metrics, including log loss, Cohen's kappa, and the Matthews correlation coefficient, further supported the effectiveness of MNB in this task. The results indicate that carefully selected n-gram representations and classification models play a crucial role in improving the accuracy of Libyan dialect identification. This study provides empirical benchmarks and insights for future research in Arabic dialect NLP applications.
- Asia > Malaysia (0.04)
- Africa > Middle East > Libya > Sabha District > Sabha (0.04)
- North America > United States > Michigan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
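As a rough illustration of what the (1,2) word and (1,5) character n-gram ranges in the dialect-identification abstract above mean, the sketch below enumerates n-grams for an inclusive length range. The helper names `ngrams`, `word_ngrams` and `char_ngrams` are hypothetical, and the study's actual feature extraction pipeline is not described in the abstract, so this is only one plausible reading.

```python
from typing import List, Sequence


def ngrams(items: Sequence[str], low: int, high: int, joiner: str) -> List[str]:
    """All contiguous n-grams with low <= n <= high, shortest first."""
    result = []
    for n in range(low, high + 1):
        for start in range(len(items) - n + 1):
            result.append(joiner.join(items[start:start + n]))
    return result


def word_ngrams(text: str, low: int, high: int) -> List[str]:
    """Word n-grams over whitespace-separated tokens."""
    return ngrams(text.split(), low, high, " ")


def char_ngrams(text: str, low: int, high: int) -> List[str]:
    """Character n-grams over the raw string, spaces included."""
    return ngrams(list(text), low, high, "")


print(word_ngrams("a b c", 1, 2))
print(len(char_ngrams("abcde", 1, 5)))
```

In a classifier, the counts of these n-grams would typically serve as the feature vector for each utterance. Character n-grams tend to be more tolerant of the inconsistent spellings the abstract mentions, since a misspelled word still shares most of its substrings with the standard form.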